9 research outputs found
A PARTAN-Accelerated Frank-Wolfe Algorithm for Large-Scale SVM Classification
Frank-Wolfe algorithms have recently regained the attention of the Machine
Learning community. Their solid theoretical properties and sparsity guarantees
make them a suitable choice for a wide range of problems in this field. In
addition, several variants of the basic procedure exist that improve its
theoretical properties and practical performance. In this paper, we investigate
the application of some of these techniques to Machine Learning, focusing in
particular on a Parallel Tangent (PARTAN) variant of the FW algorithm that has
not been previously suggested or studied for this type of problems. We provide
experiments both in a standard setting and using a stochastic speed-up
technique, showing that the considered algorithms obtain promising results on
several medium and large-scale benchmark datasets for SVM classification
Training Support Vector Machines Using Frank-Wolfe Optimization Methods
Training a Support Vector Machine (SVM) requires the solution of a quadratic
programming problem (QP) whose computational complexity becomes prohibitively
expensive for large scale datasets. Traditional optimization methods cannot be
directly applied in these cases, mainly due to memory restrictions.
By adopting a slightly different objective function and under mild conditions
on the kernel used within the model, efficient algorithms to train SVMs have
been devised under the name of Core Vector Machines (CVMs). This framework
exploits the equivalence of the resulting learning problem with the task of
building a Minimal Enclosing Ball (MEB) problem in a feature space, where data
is implicitly embedded by a kernel function.
In this paper, we improve on the CVM approach by proposing two novel methods
to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast
method to approximate the solution of a MEB problem. In contrast to CVMs, our
algorithms do not require to compute the solutions of a sequence of
increasingly complex QPs and are defined by using only analytic optimization
steps. Experiments on a large collection of datasets show that our methods
scale better than CVMs in most cases, sometimes at the price of a slightly
lower accuracy. As CVMs, the proposed methods can be easily extended to machine
learning problems other than binary classification. However, effective
classifiers are also obtained using kernels which do not satisfy the condition
required by CVMs and can thus be used for a wider set of problems
A Novel Frank-Wolfe Algorithm. Analysis and Applications to Large-Scale SVM Training
Recently, there has been a renewed interest in the machine learning community
for variants of a sparse greedy approximation procedure for concave
optimization known as {the Frank-Wolfe (FW) method}. In particular, this
procedure has been successfully applied to train large-scale instances of
non-linear Support Vector Machines (SVMs). Specializing FW to SVM training has
allowed to obtain efficient algorithms but also important theoretical results,
including convergence analysis of training algorithms and new characterizations
of model sparsity.
In this paper, we present and analyze a novel variant of the FW method based
on a new way to perform away steps, a classic strategy used to accelerate the
convergence of the basic FW procedure. Our formulation and analysis is focused
on a general concave maximization problem on the simplex. However, the
specialization of our algorithm to quadratic forms is strongly related to some
classic methods in computational geometry, namely the Gilbert and MDM
algorithms.
On the theoretical side, we demonstrate that the method matches the
guarantees in terms of convergence rate and number of iterations obtained by
using classic away steps. In particular, the method enjoys a linear rate of
convergence, a result that has been recently proved for MDM on quadratic forms.
On the practical side, we provide experiments on several classification
datasets, and evaluate the results using statistical tests. Experiments show
that our method is faster than the FW method with classic away steps, and works
well even in the cases in which classic away steps slow down the algorithm.
Furthermore, these improvements are obtained without sacrificing the predictive
accuracy of the obtained SVM model.Comment: REVISED VERSION (October 2013) -- Title and abstract have been
revised. Section 5 was added. Some proofs have been summarized (full-length
proofs available in the previous version
An Attention-Based Architecture for Hierarchical Classification With CNNs
Branch Convolutional Neural Nets have become a popular approach for hierarchical classification in computer vision and other areas. Unfortunately, these models often led to hierarchical inconsistency: predictions for the different hierarchy levels do not necessarily respect the class-subclass constraints imposed by the hierarchy. Several architectures to connect the branches have arisen to overcome this limitation. In this paper, we propose a more straightforward and flexible method: let the neural net decide how these branches must be connected. We achieve this by formulating an attention mechanism that dynamically determines how branches influence each other during training and inference. Experiments on image classification benchmarks show that the proposed method can outperform state-of-the-art models in terms of hierarchical performance metrics and consistency. Furthermore, although sometimes we found a slightly lower performance at the deeper level of the hierarchy, the model predicts much more accurately the ground-truth path between a concept and its ancestors in the hierarchy. This result suggests that the model does learn not only local class memberships but also hierarchical dependencies between concepts